Abstract. In a stochastic noise setting, the Lepskij balancing principle for choosing
the regularization parameter in the regularization of inverse problems depends on
a parameter $\tau$ which, in the currently known proofs, depends on the unknown noise
level of the input data. However, in practice this parameter seems to be obsolete.

We will present an explanation for this behavior by using a stochastic model for 
noise and initial data. Furthermore, we will prove that a small modification of the 
algorithm also improves the performance of the method, in both speed and accuracy. 
1. Introduction

In the following, we will consider linear inverse problems [EHN96, Hof86] given as an
operator equation

$$ Ax = y, \tag{1} $$

where $A : \mathcal{X} \to \mathcal{Y}$ is a linear, continuous, compact operator acting between separable
real infinite-dimensional Hilbert spaces $\mathcal{X}$, $\mathcal{Y}$. Without loss of generality we assume that
$A$ has a trivial null space $N(A) = \{0\}$. Because $A$ is compact and $\mathcal{X}$ is infinite
dimensional, $A$ does not have a continuous inverse, and hence (1) is ill-posed.

For the analysis we will need the singular value decomposition of $A$. There exist
orthonormal bases $(u_k)_{k\in\mathbb{N}}$ of $\mathcal{X}$ and $(v_k)_{k\in\mathbb{N}}$ of $\mathcal{Y}$ and a decreasing sequence of
positive singular values $(s_k)_{k\in\mathbb{N}}$ such that

$$ Ax = \sum_{k=1}^{\infty} s_k \langle x, u_k\rangle v_k. \tag{2} $$

Moreover, we assume that the data $y$ are noisy:

$$ y^\delta = Ax + \xi, \qquad \xi \text{ noise}. \tag{3} $$

The noise model for $\xi$ will be specified later. In contrast to the classical
considerations, in a stochastic setting $\xi$ is not necessarily an element of $\mathcal{Y}$.

In order to counter the ill-posedness, we need to regularize; in this article we will
concentrate on the regularization method truncated singular value decomposition
(TSVD, also called spectral cut-off regularization), which has some specific features that
make proofs considerably easier. The level $n$ at which we truncate is called the regularization
parameter. The subsampling function $s(\cdot) : \mathbb{N} \to \mathbb{N}$ is assumed to be strictly increasing.

$$ A_n^{-1} y^\delta = x_n^\delta = \sum_{k=1}^{s(n)} \left( \langle x, u_k\rangle + s_k^{-1} \langle \xi, v_k\rangle \right) u_k \tag{4} $$

The unknown noise-free regularized solution is defined as

$$ A_n^{-1} y = x_n = \sum_{k=1}^{s(n)} \langle x, u_k\rangle u_k \tag{5} $$
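The truncation in (4) and (5) can be illustrated with a minimal sketch that works directly on SVD coefficients. The function name `tsvd_coefficients`, the toy singular values $s_k = 1/k$ and the toy solution are assumptions made for this illustration only; they are not taken from the article.

```python
import numpy as np

def tsvd_coefficients(y_coeffs, sing_vals, cutoff):
    """Coefficients <x_n, u_k> of a truncated SVD solution.

    y_coeffs[k-1] holds <y, v_k>, sing_vals[k-1] holds s_k and
    cutoff plays the role of s(n). Components beyond the cutoff are zero.
    """
    y_coeffs = np.asarray(y_coeffs, dtype=float)
    sing_vals = np.asarray(sing_vals, dtype=float)
    x = np.zeros_like(y_coeffs)
    # Invert only the first `cutoff` singular values, drop the rest.
    x[:cutoff] = y_coeffs[:cutoff] / sing_vals[:cutoff]
    return x

# Hypothetical toy problem: s_k = 1/k, exact coefficients <x, u_k> = 1/k^2.
k = np.arange(1, 9)
s = 1.0 / k
x_true = 1.0 / k**2
y = s * x_true                      # noise-free data coefficients <y, v_k>
x_4 = tsvd_coefficients(y, s, cutoff=4)
```

For noise-free data the first four coefficients are recovered exactly and the remaining ones are set to zero, which is the behavior of $x_n$ in (5).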

The correct choice of the regularization parameter is of major importance for the
performance of the method. In recent times, a number of articles [GP00, MP03, BP05,
MP06, HPR07, BHM09] have considered the Lepskij Balancing principle [Lep90] for
choosing this parameter in various situations. For practical applications there are still
three open issues:

• In the case of stochastic noise, one loses, in comparison to the optimal situation, a
logarithmic factor; i.e., the proven convergence rate of the error is $O(\delta^{\nu} \log(\delta))$ in
comparison to an optimal $O(\delta^{\nu})$, where $\xi = \delta\tilde\xi$ with a normalized $\tilde\xi$ and $\nu$ depends
on $x$ and $\xi$. This phenomenon cannot be observed in practical implementations;
the question is why.



Applying Lepskij-Balancing in Practice



• In practical implementations, one can replace some knowledge needed explicitly in 
the proofs (the size of the regularized error in X) with a data-driven approximation 
without losing performance. Can this be put on a firm mathematical basis? 

• Is there a possibility to improve the speed of the method such that it can compete
with others, e.g. the Morozov Discrepancy principle [EHN96, Mor66]?

In order to explain some behavior observed in practical situations when using other
parameter choice methods, an alternative model for describing the solution and the noise has
recently proven successful [BR08, BK08]. Using this model, we can answer the
questions posed above by slightly modifying Lepskij's algorithm such that we can prove
an oracle inequality.

The outline of the article is as follows. First we will cite the definition of the 
Lepskij Balancing principle. Then we will define our model and calculate the underlying 
expectations on whose basis we will estimate the probabilities that the balancing 
principle behaves differently than expected. This will yield the desired oracle inequality. 

Using the same methodology, we will show that an estimation based on two 
measurements is sufficient to obtain the same result, of course with weaker constants. 

2. Lepskij Balancing Principle 

The key point in the Lepskij Balancing Principle is the knowledge of the noise behavior,
which has different forms for different noise regimes [GP00, MP03, BP05].

Definition 2.1 (Noise Behavior) If $\xi$ is assumed to be in a deterministic regime (i.e.,
$\|\xi\| \le \delta$), then define

$$ \varrho(n) := s_{s(n)}^{-1}\,\delta \ge \|A_n^{-1}\xi\|, \tag{6} $$

where $\delta$ is the noise level. If $\xi$ is assumed to be stochastic, then define

$$ \varrho(n)^2 := \mathbb{E}\|A_n^{-1}\xi\|^2. \tag{7} $$

Later on we will specify more precisely what we mean by stochastic. In both cases, $\varrho(\cdot)$
is a monotonically increasing function.

Now we will follow the approach presented in [BMP07], which already incorporates the
(minor) modifications of the balancing principle to make it fit for practice, in particular
by limiting the number of necessary computations.

Definition 2.2 (Special parameters) There are two special regularization parameters
which are important for the later proofs:

• $n_{opt}$: the optimal regularization parameter, i.e., we have $\|x - A_{n_{opt}}^{-1} Ax\| \approx \varrho(n_{opt})$. The
parameter $n_{opt}$ is generally unknown.

• $N$: the maximal regularization parameter, i.e., the point where one can be sure that
in any case $n_{opt} < N$. Even when one has just a very rough idea of the noise,
respectively the noise level $\delta$, this parameter can be estimated rather reliably. (E.g.,
in the deterministic case $N$ can be obtained from the inverse of $\varrho$, see [MP03]; for a
statistical setup see [MP06].)

However, assuming the knowledge of such a parameter $N$ is problematic at some
point; it is likely that a number of other parameter choice methods would work better
if one were able to detect outliers easily.

Definition 2.3 (Look-Ahead) Let $\sigma > 1$. Define the look-ahead function by

$$ l_{N,\sigma}(n) = \min\left\{ \min\{ m \mid \varrho(n)^{-1} > \sigma\,\varrho(m)^{-1} \},\; N \right\}. $$

Definition 2.4 (Balancing Functional) The balancing functional is defined as

$$ b_{N,\sigma}(n) = \max_{n < m \le l_{N,\sigma}(n)} \left\{ 4^{-1} \|x_n^\delta - x_m^\delta\|\, \varrho(m)^{-1} \right\}. $$

The smoothed balancing functional is defined as

$$ B_{N,\sigma}(n) = \max_{n \le m \le N} \{ b_{N,\sigma}(m) \}. \tag{8} $$

Definition 2.5 (Balancing Stopping Index) The balancing stopping index is defined as

$$ n_{N,\sigma,\kappa} = \min_{n \le N} \{ B_{N,\sigma}(n) \le \kappa \}. \tag{9} $$

If no ambiguities can occur, we will denote $n_{N,\sigma,\kappa}$ by $n_*$.
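Definitions 2.3 to 2.5 can be sketched in a few lines of code. The following is a minimal illustration, assuming that the regularized solutions are available as coefficient arrays `xs[0..N]` and the noise behavior as a list `rho[0..N]`; all function names and the toy numbers are hypothetical and not part of the article.

```python
import numpy as np

def look_ahead(n, rho, sigma, N):
    """l_{N,sigma}(n): first m with rho[m] > sigma * rho[n], capped at N."""
    for m in range(n + 1, N + 1):
        if rho[m] > sigma * rho[n]:
            return m
    return N

def balancing_functional(n, xs, rho, sigma, N):
    """b_{N,sigma}(n): max over n < m <= l(n) of ||x_n - x_m|| / (4 rho(m))."""
    upper = look_ahead(n, rho, sigma, N)
    vals = [np.linalg.norm(xs[n] - xs[m]) / (4.0 * rho[m])
            for m in range(n + 1, upper + 1)]
    return max(vals) if vals else 0.0   # empty range only occurs for n = N

def balancing_stopping_index(xs, rho, sigma, kappa):
    """Smallest n whose smoothed functional B(n) does not exceed kappa."""
    N = len(xs) - 1
    b = [balancing_functional(n, xs, rho, sigma, N) for n in range(N + 1)]
    # B(n) = max_{n <= m <= N} b(m), computed as a running maximum from the right.
    B = np.maximum.accumulate(b[::-1])[::-1]
    for n in range(N + 1):
        if B[n] <= kappa:
            return n
    return N

# Hypothetical toy data: one-dimensional "solutions" that settle from n = 3 on.
xs = [np.array([v]) for v in (10.0, 5.0, 2.0, 1.0, 1.0, 1.0)]
rho = [0.1 * 2**n for n in range(6)]
n_star = balancing_stopping_index(xs, rho, sigma=2.0, kappa=1.0)
```

Note that the smoothed functional requires all solutions up to $N$, which is the cost that the fast variant discussed later avoids.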

Remark 2.6 A number of results and facts are known:

• The classical proofs are for $\sigma = \infty$, i.e. $l_{N,\infty}(n) = N$. However, reducing $\sigma$ just
worsens some constants.

• In the case of deterministic noise, $\kappa = 1$. Then it holds [MP03]

$$ \|x - x_{n_*}^\delta\| \le c \left( \|x - A_{n_{opt}}^{-1} A x\| + \varrho(n_{opt}) \right), $$

where $c$ is independent of $x$ and $\xi$.

• In the case of stochastic noise and $\kappa = \varrho(N)$ it holds [GP00]

$$ \sqrt{\mathbb{E}\|x - x_{n_*}^\delta\|^2} \le c \log(\varrho(N)) \left( \|x - A_{n_{opt}}^{-1} A x\| + \varrho(n_{opt}) \right), $$

where $c$ is independent of $x$ and $\xi$.

• These results are basically independent of the regularization method, i.e. they also
apply to other well-known methods like Tikhonov regularization and Landweber
iteration [MP03].

• Similar results hold for non-linear inverse problems in combination with the
Iteratively Regularized Gauß-Newton Method (IRGNM) [BHM09].
3. A Closer Analysis

In order to analyze the behavior of the methods in practice, we will now use the Bayesian
model introduced in [BR08]:

$$ \langle x, u_k\rangle \sim \mathcal{N}(0, (\eta k^{-\gamma})^2), $$
$$ s_k = k^{-\lambda}, $$
$$ \langle \xi, v_k\rangle \sim \mathcal{N}(0, (\delta k^{\varepsilon})^2), $$

where $\gamma > 1/2$, $\lambda > 0$, $\lambda > -\varepsilon$ and all Gaussian random variables are independent
(after normalization they are iid). All expectations $\mathbb{E}$ should now be interpreted as joint
expectations of $x$ and $\xi$.
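To make the model concrete, the following sketch draws one realization of the coefficients of $x$, the singular values and the noisy data in the SVD basis. The function name `sample_model`, the finite number of coefficients and the parameter values are assumptions for illustration; the article itself works with infinitely many coefficients.

```python
import numpy as np

def sample_model(n_coeffs, eta, gamma, lam, delta, eps, rng):
    """Draw <x,u_k>, s_k, <xi,v_k> and <y^delta,v_k> for k = 1..n_coeffs."""
    k = np.arange(1, n_coeffs + 1, dtype=float)
    x = rng.normal(0.0, eta * k**(-gamma))    # <x, u_k> ~ N(0, (eta k^-gamma)^2)
    s = k**(-lam)                             # singular values s_k = k^-lambda
    xi = rng.normal(0.0, delta * k**eps)      # <xi, v_k> ~ N(0, (delta k^eps)^2)
    y_delta = s * x + xi                      # data coefficients, cf. (3)
    return x, s, xi, y_delta

# Hypothetical parameters satisfying gamma > 1/2, lambda > 0, lambda > -eps.
rng = np.random.default_rng(0)
x, s, xi, y_delta = sample_model(50, eta=1.0, gamma=1.0, lam=1.0,
                                 delta=1e-3, eps=0.0, rng=rng)
```

Here `eps = 0` corresponds to white noise; positive or negative values of `eps` give colored noise in the sense of the model.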

3.1. Spectral Cut-Off Regularization

Definition 3.1 (Subsampling) Let $\omega_0 > 1$, $\omega > 1$ and $\omega_0\omega > \omega_0 + 1$. We choose the
following subsampling for obtaining the regularization parameter:

$$ s(n) = \lceil \omega_0 \omega^n \rceil $$

Remark 3.2 Due to $\omega_0\omega > \omega_0 + 1$ it always holds $s(n+1) > s(n)$. Furthermore we
have

$$ \omega_0\omega^n \le s(n) \le \frac{\omega_0 + 1}{\omega_0}\, \omega_0\omega^n. $$
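A short numerical check of the subsampling rule may help. The sketch below evaluates $s(n) = \lceil \omega_0\omega^n\rceil$ for the hypothetical choice $\omega_0 = 3$, $\omega = 1.5$ (so that $\omega_0\omega = 4.5 > \omega_0 + 1$) and verifies strict monotonicity together with the two-sided bound $\omega_0\omega^n \le s(n) \le (\omega_0+1)\omega^n$, which follows from $\lceil a\rceil < a + 1$ and $a \ge \omega_0$.

```python
import math

def subsample(n, omega0, omega):
    """Truncation level s(n) = ceil(omega0 * omega**n)."""
    return math.ceil(omega0 * omega**n)

omega0, omega = 3.0, 1.5            # hypothetical values with omega0*omega > omega0 + 1
values = [subsample(n, omega0, omega) for n in range(8)]
print(values)                       # [3, 5, 7, 11, 16, 23, 35, 52]

# Strict monotonicity s(n+1) > s(n).
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

# Two-sided bound from the remark.
bounds_hold = all(
    omega0 * omega**n <= s_n <= (omega0 + 1.0) * omega**n
    for n, s_n in enumerate(values)
)
```

The geometric growth of $s(n)$ is what later makes the number of candidate solutions logarithmic in the noise level.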

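Basic calculus using upper and lower sums to approximate an integral yields

Lemma 3.3 Let $m/\omega \ge n \ge \omega_0$. If $\kappa > 1$ then

$$ \left(1 - \omega^{-\kappa+1}\right) \frac{1}{\kappa - 1}\, n^{-\kappa+1} \le \sum_{k=n}^{m-1} k^{-\kappa} \le \left(\frac{\omega_0 - 1}{\omega_0}\right)^{-\kappa+1} \frac{1}{\kappa - 1}\, n^{-\kappa+1}. $$

If $\kappa > 0$ then

$$ \left(\frac{\omega_0 - 1}{\omega_0}\right)^{\kappa+1} \left(1 - \omega^{-\kappa-1}\right) \frac{1}{\kappa + 1}\, m^{\kappa+1} \le \sum_{k=n}^{m-1} k^{\kappa} \le \frac{1}{\kappa + 1}\, m^{\kappa+1}. $$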



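Corollary 3.4 (Adjacent Difference) Let $0 \le n < m$. Then it holds

$$ c_1 \left( \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n(-2\gamma+1)} + \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{m(2\lambda+2\varepsilon+1)} \right) \le \mathbb{E}\|x_m^\delta - x_n^\delta\|^2 \le c_2 \left( \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n(-2\gamma+1)} + \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{m(2\lambda+2\varepsilon+1)} \right) \tag{10} $$

with

$$ c_1 = \min\left\{ \left(\frac{\omega_0 + 1}{\omega_0}\right)^{-2\gamma+1} \left(1 - \omega^{-2\gamma+1}\right),\; \left(\frac{\omega_0 - 1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \left(1 - \omega^{-2\lambda-2\varepsilon-1}\right) \right\}, $$

$$ c_2 = \max\left\{ \left(\frac{\omega_0 - 1}{\omega_0}\right)^{-2\gamma+1},\; \left(\frac{\omega_0 + 1}{\omega_0}\right)^{1+2\lambda+2\varepsilon} \right\}. $$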



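Proof

It holds

$$ x_m^\delta - x_n^\delta = \sum_{k=s(n)}^{s(m)-1} \left( \langle x, u_k\rangle + s_k^{-1}\langle \xi, v_k\rangle \right) u_k $$

and hence

$$ \mathbb{E}\|x_m^\delta - x_n^\delta\|^2 = \sum_{k=s(n)}^{s(m)-1} \left( \eta^2 k^{-2\gamma} + \delta^2 k^{2\lambda+2\varepsilon} \right) $$

and hence

$$ \left(\frac{\omega_0 + 1}{\omega_0}\right)^{-2\gamma+1} \left(1 - \omega^{-2\gamma+1}\right) \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n(-2\gamma+1)} + \left(\frac{\omega_0 - 1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \left(1 - \omega^{-2\lambda-2\varepsilon-1}\right) \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{m(2\lambda+2\varepsilon+1)} $$
$$ \le \mathbb{E}\|x_m^\delta - x_n^\delta\|^2 \le $$
$$ \left(\frac{\omega_0 - 1}{\omega_0}\right)^{-2\gamma+1} \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n(-2\gamma+1)} + \left(\frac{\omega_0 + 1}{\omega_0}\right)^{1+2\lambda+2\varepsilon} \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{m(2\lambda+2\varepsilon+1)}, $$

which yields the proposition. □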

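Corollary 3.5 (Propagated Noise) Let $0 \le n < m$. Then it holds

$$ c_3\, \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{m(2\lambda+2\varepsilon+1)} \le \varrho(m)^2 \le c_4\, \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{m(2\lambda+2\varepsilon+1)} \tag{11} $$

with

$$ c_3 = \left(\frac{\omega_0 - 1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \left(1 - \omega^{-2\lambda-2\varepsilon-1}\right), \qquad c_4 = \left(\frac{\omega_0 + 1}{\omega_0}\right)^{1+2\lambda+2\varepsilon}. $$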



Proof

Using

$$ \varrho(m)^2 = \mathbb{E}\|x_m^\delta - x_m\|^2 = \sum_{k=1}^{s(m)-1} \delta^2 k^{2\lambda+2\varepsilon} $$

we can proceed as beforehand. □
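Corollary 3.6 (Regularization Error) Let $0 \le n < m$. Then it holds

$$ c_5 \left( \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n(-2\gamma+1)} + \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{n(2\lambda+2\varepsilon+1)} \right) \le \mathbb{E}\|x_n^\delta - x\|^2 \le c_6 \left( \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n(-2\gamma+1)} + \frac{\delta^2 \omega_0^{1+2\lambda+2\varepsilon}}{1 + 2\lambda + 2\varepsilon}\, \omega^{n(2\lambda+2\varepsilon+1)} \right) \tag{12} $$

with

$$ c_5 = \min\left\{ \left(\frac{\omega_0 + 1}{\omega_0}\right)^{-2\gamma+1} \left(1 - \omega^{-2\gamma+1}\right),\; \left(\frac{\omega_0 - 1}{\omega_0}\right)^{2\lambda+2\varepsilon+1} \left(1 - \omega^{-2\lambda-2\varepsilon-1}\right) \right\}, $$

$$ c_6 = \max\left\{ \left(\frac{\omega_0 - 1}{\omega_0}\right)^{-2\gamma+1},\; \left(\frac{\omega_0 + 1}{\omega_0}\right)^{1+2\lambda+2\varepsilon} \right\}. $$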



Proof

Using

$$ \mathbb{E}\|x_n^\delta - x\|^2 = \sum_{k=s(n)}^{\infty} \eta^2 k^{-2\gamma} + \sum_{k=1}^{s(n)-1} \delta^2 k^{2\lambda+2\varepsilon} $$

we can proceed as beforehand. □

Remark 3.7 Obviously it holds $c_1 = c_5 \le c_3 \le 1 \le c_4 \le c_2 = c_6$, where we can get as
close to 1 as we want, as long as for fixed $\omega$ the constant $\omega_0$ is big enough.

Although this constant $\omega_0$ will have a large influence in the later proofs, we cannot
observe any major influence in practice [BL10]; $\omega_0 = 3$ seems to be sufficient in most
situations even when $\omega$ is rather close to 1.

As $\omega_0$ is independent of the noise level $\delta$, at least all proofs hold
asymptotically. An explanation for the insensitivity in practice towards $\gamma$ and the other
parameters might be that our inequalities to handle the probabilities are too conservative.



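Now we can approximately determine the expected minimal point for $\mathbb{E}\|x_n^\delta - x\|^2$ by balancing

$$ \mathbb{E}\|x_n - x\|^2 = \mathbb{E}\|x_n^\delta - x_n\|^2, $$

which yields

$$ \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma - 1}\, \omega^{n_{opt}(-2\gamma+1)} = \frac{\delta^2 \omega_0^{2\lambda+2\varepsilon+1}}{2\lambda + 2\varepsilon + 1}\, \omega^{n_{opt}(2\lambda+2\varepsilon+1)}, \tag{13} $$

i.e.,

$$ \frac{\eta^2}{2\gamma - 1}\, s(n_{opt})^{-2\gamma+1} \approx \frac{\delta^2}{2\lambda + 2\varepsilon + 1}\, s(n_{opt})^{2\lambda+2\varepsilon+1}, $$

and hence

$$ s(n_{opt}) \approx \left( \frac{\eta^2}{\delta^2}\, \frac{2\lambda + 2\varepsilon + 1}{2\gamma - 1} \right)^{1/(2\lambda+2\varepsilon+2\gamma)}, $$

respectively

$$ n_{opt} = \log\left( \left( \frac{\eta^2}{\delta^2}\, \frac{2\lambda + 2\varepsilon + 1}{2\gamma - 1} \right)^{1/(2\lambda+2\varepsilon+2\gamma)} \omega_0^{-1} \right) \Big/ \log\omega. $$

Obviously $n_{opt}$ does not need to exist if $\omega$ is getting too big. However, for the rest of
the article we will assume the existence of $n_{opt}$, as there exists (depending on $\omega_0$) a $\delta_0$
such that $n_{opt}$ exists for any $\delta < \delta_0$.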
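The balance relation (13) can be solved in closed form for the truncation level. The sketch below does this for hypothetical parameter values and checks that the two sides of the balance agree; the function name `optimal_index` is an assumption for illustration. The returned index is in general not an integer, which reflects the remark that $n_{opt}$ need not exist exactly.

```python
import math

def optimal_index(eta, delta, gamma, lam, eps, omega0, omega):
    """Solve eta^2/(2g-1) * s^(1-2g) = delta^2/a * s^a with a = 2*lam + 2*eps + 1."""
    a = 2.0 * lam + 2.0 * eps + 1.0
    # Both powers of s are collected on one side: s^(a + 2g - 1) = eta^2/delta^2 * a/(2g-1).
    s_opt = ((eta**2 / delta**2) * a / (2.0 * gamma - 1.0)) ** (1.0 / (a + 2.0 * gamma - 1.0))
    # Invert s = omega0 * omega**n (ignoring the ceiling in the subsampling rule).
    n_opt = math.log(s_opt / omega0) / math.log(omega)
    return s_opt, n_opt

# Hypothetical parameters.
eta, delta, gamma, lam, eps = 1.0, 1e-3, 1.0, 1.0, 0.0
s_opt, n_opt = optimal_index(eta, delta, gamma, lam, eps, omega0=3.0, omega=1.5)

a = 2.0 * lam + 2.0 * eps + 1.0
approx_error = eta**2 / (2.0 * gamma - 1.0) * s_opt ** (1.0 - 2.0 * gamma)
noise_error = delta**2 / a * s_opt**a
```

In practice one would round `n_opt` to a neighboring integer; the two error contributions then differ by a bounded factor.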
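Additionally, it holds

$$ l_{N,\sigma}(n) = n + K $$

for some fixed $K \approx \log(\sigma)/\log(\omega)$. Furthermore, we have a lemma which was proven in
[BR08].

Lemma 3.8 Let $Z = \sum_{k=1}^{\infty} \alpha_k \zeta_k^2$, $\sum_{k=1}^{\infty} \alpha_k = 1$ and $\zeta_k \sim \mathcal{N}(0, 1)$ iid. Assume that
$\max_k \alpha_k > 0$. Then

$$ \forall z \in (0, 1): \quad \mathbb{P}(Z \le z) \le \exp\left( \frac{1 - z + \log(z)}{2 \max_k \alpha_k} \right) \le (e z)^{1/(2\max_k \alpha_k)}, \tag{14} $$

$$ \forall z > 0: \quad \mathbb{P}(Z \ge z) \le \sqrt{2}\, e^{-z/4}. \tag{15} $$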
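The two tail bounds can be checked empirically. The following Monte Carlo sketch uses a hypothetical weight vector with three entries (the lemma allows infinitely many) and compares empirical tail frequencies of $Z = \sum_k \alpha_k\zeta_k^2$ with the bounds $\exp((1 - z + \log z)/(2\max_k\alpha_k))$ and $\sqrt{2}e^{-z/4}$. It is an illustration under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.3, 0.2])            # hypothetical weights, summing to 1
zeta = rng.standard_normal((200_000, alpha.size))
Z = (zeta**2) @ alpha                        # samples of Z = sum_k alpha_k zeta_k^2

def lower_tail_bound(z, alpha_max):
    """Bound on P(Z <= z) for 0 < z < 1."""
    return np.exp((1.0 - z + np.log(z)) / (2.0 * alpha_max))

def upper_tail_bound(z):
    """Bound on P(Z >= z) for z > 0."""
    return np.sqrt(2.0) * np.exp(-z / 4.0)

lower_ok = all(np.mean(Z <= z) <= lower_tail_bound(z, alpha.max())
               for z in (0.1, 0.25, 0.5))
upper_ok = all(np.mean(Z >= z) <= upper_tail_bound(z)
               for z in (2.0, 4.0, 8.0))
```

Both bounds are far from tight for this weight vector, which is in line with the later remark that the probability estimates may be conservative.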
Now we will evaluate the probabilities. 

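Lemma 3.9 Assume that $n_{opt} \le n$ and that $\omega_0$ is big enough such that

$$ \frac{c_3}{c_6} \ge \frac{1}{2}. \tag{16} $$

Then it holds that

$$ \mathbb{P}\{ b_{N,\sigma}(n) > \tau \} \le K \sqrt{2}\, e^{-\tau^2} $$

and

$$ \mathbb{P}\{ B_{N,\sigma}(n) > \tau \} \le K\, \frac{\log \delta}{-\lambda \log \omega}\, \sqrt{2}\, e^{-\tau^2}. $$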
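Proof

It holds due to (10), (11) and (13)

$$ \mathbb{P}\{ b_{N,\sigma}(n) > \tau \} \le \sum_{1 \le k \le K} \mathbb{P}\left\{ 4^{-1} \|x_n^\delta - x_{n+k}^\delta\|\, \varrho(n+k)^{-1} > \tau \right\} $$
$$ \le K \max_{1 \le k \le K} \mathbb{P}\left\{ 4^{-1} \|x_n^\delta - x_{n+k}^\delta\|\, \varrho(n+k)^{-1} > \tau \right\} $$
$$ = K \max_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} > 16\tau^2\, \frac{\varrho(n+k)^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \right\} $$
$$ \le K \max_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} > 16\tau^2\, \frac{c_3\, \frac{\delta^2 \omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\, \omega^{(n+k)(2\lambda+2\varepsilon+1)}}{2 c_6\, \frac{\delta^2 \omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\, \omega^{(n+k)(2\lambda+2\varepsilon+1)}} \right\} $$
$$ \le K \max_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} > 4\tau^2 \right\} \qquad \text{(by (16))} $$
$$ \le K \sqrt{2}\, e^{-\tau^2} \qquad \text{(by (15))}. $$

The second inequality follows directly, using that any $s(N)^{-\lambda} < \delta$ does not make any
sense. □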

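Lemma 3.10 Assume that it holds $n_{opt} > n$, with $\omega_0$ big enough such that

$$ \frac{c_1 \omega_0 \omega}{2\gamma - 1} \ge 1 \tag{17} $$

and

$$ \frac{c_4}{c_1} \le 2. \tag{18} $$

Then it holds that

$$ \mathbb{P}\{ b_{N,\sigma}(n) \le \tau \} \le 32 e\, \omega^{2\lambda+2\varepsilon+1}\, \tau^2\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}, $$

where $T$ is independent of $n$ and linearly dependent on $\tau$; $\omega_1 > \omega$ is independent of $n$ and
linearly dependent on $\omega$. Furthermore, it holds

$$ \mathbb{P}\{ B_{N,\sigma}(n) \le \tau \} \le 32 e\, \omega^{2\lambda+2\varepsilon+1}\, \tau^2\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}. $$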
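Proof

It holds due to (10), (11), (13) and (14)

$$ \mathbb{P}\{ b_{N,\sigma}(n) \le \tau \} \le \mathbb{P}\left\{ \forall\, 1 \le k \le K : 4^{-1} \|x_n^\delta - x_{n+k}^\delta\|\, \varrho(n+k)^{-1} \le \tau \right\} $$
$$ \le \min_{1 \le k \le K} \mathbb{P}\left\{ 4^{-1} \|x_n^\delta - x_{n+k}^\delta\|\, \varrho(n+k)^{-1} \le \tau \right\} $$
$$ = \min_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \le 16\tau^2\, \frac{\varrho(n+k)^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \right\} $$
$$ \le \min_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \le 16\tau^2\, \frac{c_4\, \frac{\delta^2 \omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\, \omega^{(n+k)(2\lambda+2\varepsilon+1)}}{c_1\, \frac{\eta^2 \omega_0^{-2\gamma+1}}{2\gamma-1}\, \omega^{n(-2\gamma+1)}} \right\} \qquad \text{(by (10), (11))} $$
$$ \le \min_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \le 16\tau^2\, \frac{c_4\, \frac{\delta^2 \omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\, \omega^{(n+k)(2\lambda+2\varepsilon+1)}}{c_1\, \frac{\delta^2 \omega_0^{2\lambda+2\varepsilon+1}}{2\lambda+2\varepsilon+1}\, \omega^{n_{opt}(2\lambda+2\varepsilon+1)}\, \omega^{(-2\gamma+1)(n-n_{opt})}} \right\} \qquad \text{(by (13))} $$
$$ = \min_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \le \tau^2\, \frac{16 c_4}{c_1}\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}\, \omega^{k(2\lambda+2\varepsilon+1)} \right\} $$
$$ \le \min_{1 \le k \le K} \mathbb{P}\left\{ \frac{\|x_n^\delta - x_{n+k}^\delta\|^2}{\mathbb{E}\|x_n^\delta - x_{n+k}^\delta\|^2} \le 32\tau^2\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}\, \omega^{k(2\lambda+2\varepsilon+1)} \right\} \qquad \text{(by (18))} $$
$$ \le \min_{1 \le k \le K} \left( 32 e \tau^2\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)}\, \omega^{k(2\lambda+2\varepsilon+1)} \right)^{1/(2\max_j \alpha_j)} \qquad \text{(by (14))} $$
$$ \le 32 e\, \omega^{2\lambda+2\varepsilon+1}\, \tau^2\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)} \qquad \text{(by (17))}, $$

where the $\alpha_j$ denote the weights of Lemma 3.8 belonging to the normalized difference.
The second inequality is trivial. □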

This means that the balancing functional $b_{N,\sigma}$, respectively its smoothed version $B_{N,\sigma}$,
shows the following behavior:

• Assume $n < n_{opt}$. The probability that $b(\cdot)$ falls below the threshold becomes
smaller and smaller the farther away $n$ is from $n_{opt}$; near $n_{opt}$, one cannot make
any sensible statements, as in the above inequality the bound for the probability is
bigger than 1. In particular, the decay of probabilities is faster than the increase
of error for smaller regularization parameters.

• Beyond the point $n_{opt}$, the probability of being above the threshold depends only
on the level of the threshold.

Using this behavior, we can define the following method. This idea has already been
presented in a different form in [RH08], however in a purely deterministic setting with
a focus on convergence results.

Definition 3.11 (Fast Balancing) Define

$$ n_{fb} = \operatorname{argmin}_n \{ b_{N,\sigma}(n) \le \tau \}. $$
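The fast balancing rule stops at the first index whose (unsmoothed) balancing functional falls below the threshold, so solutions are only computed as far as needed. The following sketch assumes the look-ahead has the form $l_{N,\sigma}(n) = n + K$ and that solutions and noise behavior are supplied as callables; the names `fast_balancing`, `solution` and `rho` as well as the toy data are hypothetical.

```python
import numpy as np

def fast_balancing(solution, rho, tau, K, N):
    """Return (n_fb, number of solutions computed).

    solution(n) returns x_n^delta as an array, rho(m) the noise behavior.
    b(n) is the max over m = n+1..min(n+K, N) of ||x_n - x_m|| / (4 rho(m)).
    """
    cache = {}

    def x(n):
        # Compute each regularized solution at most once.
        if n not in cache:
            cache[n] = solution(n)
        return cache[n]

    for n in range(N + 1):
        upper = min(n + K, N)
        b = max((np.linalg.norm(x(n) - x(m)) / (4.0 * rho(m))
                 for m in range(n + 1, upper + 1)), default=0.0)
        if b <= tau:
            return n, len(cache)      # stop early: larger n are never touched
    return N, len(cache)

# Hypothetical toy data: one-dimensional "solutions" that settle from n = 3 on.
vals = (10.0, 5.0, 2.0, 1.0, 1.0, 1.0)
n_fb, n_computed = fast_balancing(
    solution=lambda n: np.array([vals[n]]),
    rho=lambda m: 0.1 * 2**m,
    tau=1.0, K=1, N=5,
)
```

In this toy run only the solutions for $n = 0, \dots, n_{fb} + K$ are computed, in contrast to the smoothed functional, which needs all solutions up to $N$.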
Theorem 3.12 Let $\sigma$ be such that $K = 1$ and assume that $\omega_0$ is big enough such that
(16), (17) and (18) hold; furthermore assume that $n_{opt}$ exists.

For any $N$ (including $N = \infty$) and any $\tau \ge 1$, $2\lambda + 2\varepsilon > 0$, the parameter $n_{fb}$
exists with probability 1 and the oracle inequality

$$ \mathbb{E}\|x_{n_{fb}}^\delta - x\|^2 \le C \min_n \mathbb{E}\|x_n^\delta - x\|^2 $$

holds, where $C$ is not dependent on the particular $x$ and $\xi$ (i.e., not on $\delta$ resp. $\eta$).

The proof we use is rather similar to the one used in [BR08]:

Proof
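The proof consists of three parts:

Due to $K = 1$, all random variables $b_{N,\sigma}(n)$ are independent. Hence, using Lemma 3.9,
it holds that

$$ \mathbb{P}(n_{fb} > n_{opt} + k) \le \left( \sqrt{2}\, e^{-\tau^2} \right)^k. $$

This trivially yields that $n_{fb}$ exists with probability 1.

Hence we obtain, using the Hölder inequality with $p^{-1} + p'^{-1} = 1$,

$$ \mathbb{E}\|x_{n_{fb}}^\delta - x\|^2 = \sum_{n=0}^{\infty} \mathbb{E}\left( \|x - x_n^\delta\|^2\, \mathbf{1}_{\{n = n_{fb}\}} \right) $$
$$ \le \sum_{n=0}^{n_{opt}-2} \left( \mathbb{E}\|x - x_n^\delta\|^{2p'} \right)^{1/p'} \left( \mathbb{E}\, \mathbf{1}_{\{n = n_{fb}\}} \right)^{1/p} $$
$$ \quad + \max\left\{ \mathbb{E}\|x - x_{n_{opt}-1}^\delta\|^2,\; \mathbb{E}\|x - x_{n_{opt}}^\delta\|^2 \right\} $$
$$ \quad + \sum_{n=n_{opt}+1}^{\infty} \left( \mathbb{E}\|x - x_n^\delta\|^{2p'} \right)^{1/p'} \left( \mathbb{E}\, \mathbf{1}_{\{n = n_{fb}\}} \right)^{1/p}. $$

In [BR08] it is proven using the Gaussian behavior that

$$ \left( \mathbb{E}\|x - x_n^\delta\|^{2p'} \right)^{1/p'} \le c_p\, \mathbb{E}\|x - x_n^\delta\|^2 \tag{19} $$

for some constant $c_p \ge 1$ depending only on $p$. Now using that $\lambda > -\varepsilon$ we can choose
$p$ near enough to 1 such that

$$ 2\lambda + 2\varepsilon + 2\gamma(1 - p) + p > 0 \tag{20} $$

and furthermore assume that $\tau$ in relation to $\omega$ was chosen in such a way that

$$ \omega^{2\lambda+2\varepsilon+1} \left( \sqrt{2}\, e^{-\tau^2} \right)^{1/p} < 1. \tag{21} $$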
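Using Lemmas 3.9 and 3.10,

$$ \mathbb{E}\|x_{n_{fb}}^\delta - x\|^2 \le c_p \sum_{n=0}^{n_{opt}-2} \mathbb{E}\|x - x_n^\delta\|^2 \left( \mathbb{P}\{ b_{N,\sigma}(n) \le \tau \} \right)^{1/p} $$
$$ \quad + \max\left\{ \mathbb{E}\|x - x_{n_{opt}-1}^\delta\|^2,\; \mathbb{E}\|x - x_{n_{opt}}^\delta\|^2 \right\} $$
$$ \quad + c_p \sum_{n=n_{opt}+1}^{\infty} \mathbb{E}\|x - x_n^\delta\|^2 \left( \prod_{k=n_{opt}+1}^{n} \mathbb{P}\{ b_{N,\sigma}(k) > \tau \} \right)^{1/p} $$
$$ \le 2 c_p\, \mathbb{E}\|x - x_{n_{opt}}^\delta\|^2 \Bigg( \sum_{n=0}^{n_{opt}-2} \left( 32 e\, \omega^{2\lambda+2\varepsilon+1}\, \tau^2\, \omega^{-(n_{opt}-n)(2\lambda+2\varepsilon+2\gamma)} \right)^{1/p} c_6 c_5^{-1}\, \omega^{-(n_{opt}-n)(-2\gamma+1)} $$
$$ \qquad + \omega^{-(-2\gamma+1)} + \sum_{n=n_{opt}+1}^{\infty} \omega^{(n-n_{opt})(2\lambda+2\varepsilon+1)} \left( \sqrt{2}\, e^{-\tau^2} \right)^{(n-n_{opt})/p} \Bigg) $$
$$ \le \frac{C}{2}\, \mathbb{E}\|x - x_{n_{opt}}^\delta\|^2 $$
$$ \le C \min_n \mathbb{E}\|x_n^\delta - x\|^2 $$

due to the definition of $n_{opt}$, where

$$ C \le 4 c_p \left( \left( 32 e\, \omega^{2\lambda+2\varepsilon+1}\, \tau^2 \right)^{1/p} c_6 c_5^{-1} \left( 1 - \omega^{-(2\lambda+2\varepsilon+2\gamma(1-p)+p)/p} \right)^{-1} + \omega^{-(-2\gamma+1)} + \left( 1 - \omega^{2\lambda+2\varepsilon+1} \left( \sqrt{2}\, e^{-\tau^2} \right)^{1/p} \right)^{-1} \right). $$

Obviously $C$ is independent of the particular $x$ and $\xi$. □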

This means in particular that we do not lose a logarithmic factor and can set $\tau = 1$
without a problem as long as we keep $\omega$ small enough. Furthermore, this speeds up the
method considerably since, as in the Morozov discrepancy principle, we no longer need to
find solutions for all $n$ up to $N$ but can stop after considering at most $n_* + K \approx n_{opt} + K$
solutions. Practice shows that the method also works for $K > 1$ and even becomes more
stable; however, the proof would be unnecessarily complicated.



4. Obtaining the Noise Behavior 

In practice, one often does not know $\varrho$ and therefore needs to estimate it. Nevertheless,
in most practical situations it is possible to measure more than once or to partition the
data into two or more data sets.

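Assume that one can partition the measurement into two parts $y_1^{\tilde\delta}$ and $y_2^{\tilde\delta}$ with
$\tilde\delta = \sqrt{2}\,\delta$; we have

$$ y^\delta = \tfrac{1}{2} \left( y_1^{\tilde\delta} + y_2^{\tilde\delta} \right). $$

The estimate of $\varrho$ is now

$$ \tilde\varrho(n) = \tfrac{1}{2} \left\| A_n^{-1} \left( y_1^{\tilde\delta} - y_2^{\tilde\delta} \right) \right\| $$

and it obviously holds

$$ \mathbb{E}\tilde\varrho(n)^2 = \varrho(n)^2. \tag{22} $$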

Accordingly, we can define $\tilde b_{N,\sigma}(n)$ by just replacing $\varrho$ with $\tilde\varrho$.

This means that we can modify the probability estimations using a similar trick as
in [BR08]. It is important to notice that there is no way to reliably estimate the color of
the noise based on only two solutions; the same holds for the noise level when the color
of the noise is not known. Nevertheless, the information we obtain from two solutions
is sufficient for optimal reconstructions.
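The unbiasedness relation (22) can be illustrated with a small simulation. The displayed formula for the estimator is not legible in the available text, so the sketch below assumes the form $\tilde\varrho(n) = \tfrac12\|x_{n,1} - x_{n,2}\|$, where $x_{n,1}$ and $x_{n,2}$ are the TSVD solutions of two independent measurements with noise level $\sqrt{2}\,\delta$ each. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, eps, delta = 1.0, 0.0, 1e-2          # hypothetical model parameters
cutoff = 20                               # plays the role of s(n)
k = np.arange(1, cutoff + 1, dtype=float)
s = k**(-lam)                             # singular values s_k = k^-lambda
noise_std = np.sqrt(2.0) * delta * k**eps # each part has noise level sqrt(2)*delta

# Reference value: rho(n)^2 = sum_k delta^2 k^(2 lam + 2 eps).
rho_sq = np.sum(delta**2 * k**(2.0 * lam + 2.0 * eps))

trials = 20_000
xi1 = rng.normal(0.0, noise_std, size=(trials, cutoff))
xi2 = rng.normal(0.0, noise_std, size=(trials, cutoff))

# The exact signal cancels in the difference of the two TSVD solutions,
# so only the propagated noise coefficients (xi1 - xi2) / s_k remain.
diff = (xi1 - xi2) / s
rho_tilde_sq = 0.25 * np.sum(diff**2, axis=1)

relative_bias = rho_tilde_sq.mean() / rho_sq - 1.0
```

Under these assumptions the sample mean of $\tilde\varrho(n)^2$ agrees with $\varrho(n)^2$ up to Monte Carlo error, although a single realization of $\tilde\varrho(n)^2$ fluctuates noticeably around this value.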

Lemma 4.1 Assume that $n_{opt} \le n$ and that $\omega_0$ is big enough such that

$$ \frac{c_3}{c_6} \ge \frac{1}{2} $$



and 



Then it holds if < 1 that 



C3U0 > 1 (23) 



1 + 2A + 2e 



Hb N A n ) >r}<K- 

T 



Proof 

Using (22), (11), Lemma 3.8 and parts which have already been shown in Lemma 3.9, it holds:

K- l ¥{b N ^n)>T} <K- 1 P { 4 " 1 H4-4+fell^ + fc )" 1 

l<fe<A' 

< ma* : P {^K - x s n+k \\g(n + k)' 1 > r} 



Kk<K 

^ 1C 2 g{n + k) 2 g{n + k) 2 



max P " + ' > 1 6r 2 - , , < „, 

i<k<K \E\\x & n -x 5 n+k \\ 2 E\\x s n -x s n+k \\ 2 g(n + ky 

< max P <^ drr-^ , ' > 4r 2 v 1 



i<fc<^ lllK-s'+J 2 Eg{n + k) 2 ) 

{ 114 1 ™ f 1 ^(n + fc) 2 

< max P ^ > 2r I + P ^ — > ,\ 

- i<fc<x (E iJ-i^ P J \2r Eg(n + k) 2 



< \/2V t/2 + max , 

i<fc<^" \2r 



^2 l+2A+2c 

™lJf!J w (n+*)(2A+2s+l) 

■ j 1+2A+2E 

<5 2 1J 2A+2 e ^(„ + fc)(2A+2£) 



I— ^/n I e \ C3 l+2A+2e 

< \/2e- r/2 + ( — 

AT 

e 

2r 



< V2e- T / 2 + ^<-. □ 
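Lemma 4.2 Assume that it holds $n_{opt} > n$ and assume that $\omega_0$ is big enough such that

$$ \frac{c_1 \omega_0 \omega}{2\gamma - 1} \ge 1 $$

and

$$ \frac{c_4}{c_1} \le 2. $$

Then it holds that

$$ \mathbb{P}\{ \tilde b_{N,\sigma}(n) \le \tau \} \le 64 e\, \omega^{2\lambda+2\varepsilon+1}\, \tau\, \omega^{-(n_{opt}-n)(\lambda+2\varepsilon+2\gamma)}. $$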

Proof 

It holds using Lemma 3.8 and parts of Lemma 3.10

PfcW < r ) < P{Vi<fe<A' : 4- 1 ||4 - x s n+k \\g(n + k)' 1 < r} 

< ^nin^P {4- 1 ||< - x s n+k \\g(n + k)' 1 < r} 

m min p f ll4-4 +fc l| 2 ^ Wr 2 Q(n + k) 2 g(n + k) 2 \ 
i<k<K \E\\x s n - x 5 n+k \\ 2 E\\x s n - x s n+k \\ 2 g{n + kf J 

^t^p J W X n ~ X n+ll! 2 < 32 r 2 ^-( w °pt-n)(2A+2£+27)^,fc(2A+2£+l) + 

~ \E\\xi-x s n+l \\ 2 ~ Eg{n + 1)\ 

< p / ll X n ~ X n+ll| 2 22 rw -( rt °Pt- n )( A + 2£ + 2 7) CJ fc (2A+2e+l)l > 

" \E||^-x^ +1 || 2 ^ J 

£?(n + l) 2 1 



+ P I TU {n ° pt - n)X < 



Eg(n + l) 2 



32eu (2X+2e+l) TU} -(n opt -n)(X+2e+2 1 ) + ^-r^""^ 
< Q4 :euJ 2X + 2£ + 1 rU j-( n o P t~n)(\+2e+2~ f ) Q 

This means that, in principle, the balancing functional $\tilde b_{N,\sigma}$ shows the same behavior as
its non-estimated counterpart $b_{N,\sigma}$.

Using this behavior, we can define a version of the new method:

Definition 4.3 (Fast Balancing) Define

$$ \tilde n_{fb} = \operatorname{argmin}_n \{ \tilde b_{N,\sigma}(n) \le \tau \}. $$

Theorem 4.4 Let $\sigma$ be such that $K = 1$ and assume that $\omega_0$ is big enough such that (16),
(17), (18) and (23) hold; furthermore assume that $n_{opt}$ exists.

For any $N$ (including $N = \infty$) and any $\tau \ge 1$, $\lambda + 2\varepsilon > 0$, the parameter $\tilde n_{fb}$ exists
with probability 1 and the oracle inequality

$$ \mathbb{E}\|x_{\tilde n_{fb}}^\delta - x\|^2 \le C \min_n \mathbb{E}\|x_n^\delta - x\|^2 $$

holds, where $C$ is not dependent on the particular $x$ and $\xi$ (i.e., not on $\delta$ resp. $\eta$).
The proof works in exactly the same way as for Theorem 3.12.
5. Conclusion

Assuming that our model is suitable for describing real data, we have presented an
answer to the initial questions, at least for the newly defined methods:

• We do not lose a logarithmic factor, because the probability of the balancing
principle going completely wrong is negligibly small.

• We do not need explicit knowledge of the noise level $\delta$ and the noise behavior. A
rough estimation based on two independent measurements is sufficient.

• The newly introduced method is as fast as the Morozov discrepancy principle (if
one neglects constant factors).

Although the situation is not completely comparable with the case of deterministic
$x$, which suffers from the mentioned logarithmic factor, we think this is a significant
advance in understanding the difference between the theoretical and the actual behavior of the
balancing principle. Though it has not been shown in this paper, one can also transfer parts of the
proofs to the case of Tikhonov regularization [Bau10].

Furthermore, large numerical experiments show that the newly defined method
works very well and can, in contrast to most other parameter choice regimes, cope
with colored noise without any performance loss [BL10]. In these experiments it was
observed that the factor $C$ in the oracle inequality is at most around 2. The method
is very stable, i.e., the number of observed outliers is very low, both for Tikhonov and
Spectral Cut-Off regularization.

Additionally it was observed that the stability increases if one uses more than two
measurements in order to estimate the noise behavior and if one chooses $K$ a bit bigger
than 1.